Adding Token Counting to Directory-Based Cache Coherence
نویسندگان
چکیده
The coherence protocol is a first-order design concern in multicore designs. Directory protocols are naturally scalable, as they place no restrictions on the interconnect and have minimal bandwidth requirements; however, this scalability comes at the cost of increased sharing latency due to indirection. In contrast, broadcast-based systems such as snooping protocols and token coherence reduce latency of sharing misses by sending requests directly to other processors. Unfortunately, their reliance on totally ordered interconnects and/or broadcast limits their scalability. This work introduces PATCH (Predictive/Adaptive Token Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically using available bandwidth to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a novel mechanism called token tenure uses local processor timeouts and the directory’s per-block point of ordering at the home node to guarantee forward progress without relying on broadcast. PATCH makes three main contributions. First, PATCH uses direct request prioritization to match the performance of broadcast-based protocols without restricting scalability. Second, PATCH introduces token tenure, which provides broadcast-free forward progress for token counting protocols. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between Comments University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-08-22. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/882 Adding Token Counting to Directory-Based Cache Coherence Arun Raghavan, Colin Blundell, Milo M. K. Martin University of Pennsylvania UPenn CIS Technical Report TR-CIS-08-22 June 4, 2008 Abstract The coherence protocol is a first-order design concern in multicore designs. Directory protocols are naturally scalable, as they place no restrictions on the interconnect and have minimal bandwidth requirements; however, this scalability comes at the cost of increased sharing latency due to indirection. In contrast, broadcast-based systems such as snooping protocols and token coherence reduce latency of sharing misses by sending requests directly to other processors. Unfortunately, their reliance on totally ordered interconnects and/or broadcast limits their scalability. This work introduces PATCH (Predictive/Adaptive Token Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically using available bandwidth to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a novel mechanism called token tenure uses local processor timeouts and the directory’s per-block point of ordering at the home node to guarantee forward progress without relying on broadcast. PATCH makes three main contributions. First, PATCH uses direct request prioritization to match the performance of broadcast-based protocols without restricting scalability. Second, PATCH introduces token tenure, which provides broadcast-free forward progress for token counting protocols. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between.The coherence protocol is a first-order design concern in multicore designs. Directory protocols are naturally scalable, as they place no restrictions on the interconnect and have minimal bandwidth requirements; however, this scalability comes at the cost of increased sharing latency due to indirection. In contrast, broadcast-based systems such as snooping protocols and token coherence reduce latency of sharing misses by sending requests directly to other processors. Unfortunately, their reliance on totally ordered interconnects and/or broadcast limits their scalability. This work introduces PATCH (Predictive/Adaptive Token Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically using available bandwidth to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a novel mechanism called token tenure uses local processor timeouts and the directory’s per-block point of ordering at the home node to guarantee forward progress without relying on broadcast. PATCH makes three main contributions. First, PATCH uses direct request prioritization to match the performance of broadcast-based protocols without restricting scalability. Second, PATCH introduces token tenure, which provides broadcast-free forward progress for token counting protocols. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between.
منابع مشابه
Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs
In many-core CMP architectures, the cache coherence protocol is a key component since it can add requirements of area and power consumption to the final design and, therefore, it could restrict severely its scalability. Area constraints limit the use of precise sharing codes to smallor medium-scale CMPs. Power constraints make impractical to use broadcast-based protocols for large-scale CMPs. T...
متن کاملTwo proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others) contrasts with the much slower main memory. Unfortunately, directory-based protocols need to obtain the sharing status of every memory block before coherence actions can be performed. This info...
متن کاملToken Coherence: Low-Latency Coherence on Unordered Interconnects
Future shared-memory multiprocessor servers will target commercial workloads using highly-integrated “glueless” designs. Commercial workloads, which exhibit frequent sharing misses, benefit from the direct communication of snooping protocols. Unfortunately, snooping systems require a totally-ordered interconnect, which is difficult to efficiently implement in glueless designs. The standard alte...
متن کاملToken Coherence: A New Framework for Shared-Memory Multiprocessors
Commercial workload and technology trends are pushing existing shared-memory multiprocessor coherence protocols in divergent directions. Token Coherence provides a framework for new coherence protocols that can reconcile these opposing trends. Comments Copyright 2003 IEEE. Reprinted from IEEE Micro, Volume 23, Issue 6, 2003, pages 108-116. This material is posted here with permission of the IEE...
متن کاملModelling Accesses to Stationary Data in a Shared Memory
Cache misses due to coherence and directory maintenance is a major reason for poor performance in shared memory multiprocessors. We show that the relationship between a particular access pattern and cache miss ratios for a class of directory-based, write-invalidate cache coherence protocols can be characterised in a small set of parameters. In order to do this, a reference generator has been de...
متن کامل